-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[VectorCombine] Combine extract/insert from vector #115213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-vectorizers Author: hanbeom (ParkHanbum) Changesinsert (DstVec, (extract (binop), ExtIdx), InsIdx) --> shuffl (DstVec, (binop), Mask) This commit combines extract/insert with BinaryOperation on a vector into Shuffle+BinaryOperation with vector. Patch is 24.73 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/115213.diff 5 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
index 58145c7e3c5913..0ccd535303686d 100644
--- a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+++ b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
@@ -106,6 +106,7 @@ class VectorCombine {
Instruction &I);
bool foldExtractExtract(Instruction &I);
bool foldInsExtFNeg(Instruction &I);
+ bool foldInsExtOfBinOpShuffle(Instruction &I);
bool foldBitcastShuffle(Instruction &I);
bool scalarizeBinopOrCmp(Instruction &I);
bool scalarizeVPIntrinsic(Instruction &I);
@@ -2678,6 +2679,52 @@ bool VectorCombine::shrinkType(llvm::Instruction &I) {
return true;
}
+/// insert (DstVec, (extract (binop), ExtIdx), InsIdx) -->
+/// shuffl (DstVec, (binop), Mask)
+bool VectorCombine::foldInsExtOfBinOpShuffle(Instruction &I) {
+ Value *DstVec;
+ BinaryOperator *BO;
+ uint64_t ExtIdx, InsIdx;
+ if (!match(&I, m_InsertElt(
+ m_Value(DstVec),
+ m_OneUse(m_ExtractElt(m_BinOp(BO), m_ConstantInt(ExtIdx))),
+ m_ConstantInt(InsIdx))))
+ return false;
+
+ if (!isSafeToSpeculativelyExecute(BO))
+ return false;
+
+ auto *VecTy = cast<FixedVectorType>(I.getType());
+ if (BO->getType() != VecTy)
+ return false;
+
+ unsigned NumElts = VecTy->getNumElements();
+ if (ExtIdx >= NumElts)
+ return false;
+
+ SmallVector<int> Mask(NumElts);
+ std::iota(Mask.begin(), Mask.end(), 0);
+ Mask[InsIdx] = ExtIdx + NumElts;
+ // Cost
+ ExtractElementInst *Ext;
+ if ((Ext = dyn_cast<ExtractElementInst>(I.getOperand(0))) == nullptr)
+ Ext = dyn_cast<ExtractElementInst>(I.getOperand(1));
+
+ TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
+ InstructionCost OldCost =
+ TTI.getVectorInstrCost(*Ext, VecTy, CostKind, ExtIdx);
+ InstructionCost NewCost =
+ TTI.getShuffleCost(TargetTransformInfo::SK_Select, VecTy, Mask);
+
+ if (OldCost < NewCost)
+ return false;
+
+ Value *Shuf = Builder.CreateShuffleVector(DstVec, BO, Mask);
+ replaceValue(I, *Shuf);
+
+ return true;
+}
+
/// This is the entry point for all transforms. Pass manager differences are
/// handled in the callers of this function.
bool VectorCombine::run() {
@@ -2734,6 +2781,7 @@ bool VectorCombine::run() {
switch (Opcode) {
case Instruction::InsertElement:
MadeChange |= foldInsExtFNeg(I);
+ MadeChange |= foldInsExtOfBinOpShuffle(I);
break;
case Instruction::ShuffleVector:
MadeChange |= foldShuffleOfBinops(I);
diff --git a/llvm/test/Transforms/VectorCombine/X86/extract-binop-inseltpoison.ll b/llvm/test/Transforms/VectorCombine/X86/extract-binop-inseltpoison.ll
index 3d69f15fc5f249..e5880c93a9020f 100644
--- a/llvm/test/Transforms/VectorCombine/X86/extract-binop-inseltpoison.ll
+++ b/llvm/test/Transforms/VectorCombine/X86/extract-binop-inseltpoison.ll
@@ -417,12 +417,18 @@ define float @ext14_ext15_fmul_v16f32(<16 x float> %x) {
}
define <4 x float> @ins_bo_ext_ext(<4 x float> %a, <4 x float> %b) {
-; CHECK-LABEL: @ins_bo_ext_ext(
-; CHECK-NEXT: [[SHIFT:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 poison, i32 2>
-; CHECK-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[SHIFT]], [[A]]
-; CHECK-NEXT: [[A23:%.*]] = extractelement <4 x float> [[TMP1]], i64 3
-; CHECK-NEXT: [[V3:%.*]] = insertelement <4 x float> [[B:%.*]], float [[A23]], i32 3
-; CHECK-NEXT: ret <4 x float> [[V3]]
+; SSE-LABEL: @ins_bo_ext_ext(
+; SSE-NEXT: [[SHIFT:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 poison, i32 2>
+; SSE-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[SHIFT]], [[A]]
+; SSE-NEXT: [[A23:%.*]] = extractelement <4 x float> [[TMP1]], i64 3
+; SSE-NEXT: [[V3:%.*]] = insertelement <4 x float> [[B:%.*]], float [[A23]], i32 3
+; SSE-NEXT: ret <4 x float> [[V3]]
+;
+; AVX-LABEL: @ins_bo_ext_ext(
+; AVX-NEXT: [[SHIFT:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 poison, i32 2>
+; AVX-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[SHIFT]], [[A]]
+; AVX-NEXT: [[V3:%.*]] = shufflevector <4 x float> [[B:%.*]], <4 x float> [[TMP1]], <4 x i32> <i32 0, i32 1, i32 2, i32 7>
+; AVX-NEXT: ret <4 x float> [[V3]]
;
%a2 = extractelement <4 x float> %a, i32 2
%a3 = extractelement <4 x float> %a, i32 3
@@ -435,13 +441,21 @@ define <4 x float> @ins_bo_ext_ext(<4 x float> %a, <4 x float> %b) {
; but it is likely that extracting from index 3 is the better option.
define <4 x float> @ins_bo_ext_ext_uses(<4 x float> %a, <4 x float> %b) {
-; CHECK-LABEL: @ins_bo_ext_ext_uses(
-; CHECK-NEXT: [[SHIFT:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 3, i32 poison>
-; CHECK-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A]], [[SHIFT]]
-; CHECK-NEXT: [[A23:%.*]] = extractelement <4 x float> [[TMP1]], i32 2
-; CHECK-NEXT: call void @use_f32(float [[A23]])
-; CHECK-NEXT: [[V3:%.*]] = insertelement <4 x float> [[B:%.*]], float [[A23]], i32 3
-; CHECK-NEXT: ret <4 x float> [[V3]]
+; SSE-LABEL: @ins_bo_ext_ext_uses(
+; SSE-NEXT: [[SHIFT:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 3, i32 poison>
+; SSE-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A]], [[SHIFT]]
+; SSE-NEXT: [[A23:%.*]] = extractelement <4 x float> [[TMP1]], i32 2
+; SSE-NEXT: call void @use_f32(float [[A23]])
+; SSE-NEXT: [[V3:%.*]] = insertelement <4 x float> [[B:%.*]], float [[A23]], i32 3
+; SSE-NEXT: ret <4 x float> [[V3]]
+;
+; AVX-LABEL: @ins_bo_ext_ext_uses(
+; AVX-NEXT: [[SHIFT:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 3, i32 poison>
+; AVX-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A]], [[SHIFT]]
+; AVX-NEXT: [[A23:%.*]] = extractelement <4 x float> [[TMP1]], i32 2
+; AVX-NEXT: call void @use_f32(float [[A23]])
+; AVX-NEXT: [[V3:%.*]] = shufflevector <4 x float> [[B:%.*]], <4 x float> [[TMP1]], <4 x i32> <i32 0, i32 1, i32 2, i32 6>
+; AVX-NEXT: ret <4 x float> [[V3]]
;
%a2 = extractelement <4 x float> %a, i32 2
%a3 = extractelement <4 x float> %a, i32 3
@@ -452,22 +466,37 @@ define <4 x float> @ins_bo_ext_ext_uses(<4 x float> %a, <4 x float> %b) {
}
define <4 x float> @PR34724(<4 x float> %a, <4 x float> %b) {
-; CHECK-LABEL: @PR34724(
-; CHECK-NEXT: [[A0:%.*]] = extractelement <4 x float> [[A:%.*]], i32 0
-; CHECK-NEXT: [[A1:%.*]] = extractelement <4 x float> [[A]], i32 1
-; CHECK-NEXT: [[SHIFT:%.*]] = shufflevector <4 x float> [[A]], <4 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 3, i32 poison>
-; CHECK-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A]], [[SHIFT]]
-; CHECK-NEXT: [[A23:%.*]] = extractelement <4 x float> [[TMP1]], i32 2
-; CHECK-NEXT: [[SHIFT1:%.*]] = shufflevector <4 x float> [[B:%.*]], <4 x float> poison, <4 x i32> <i32 1, i32 poison, i32 poison, i32 poison>
-; CHECK-NEXT: [[TMP2:%.*]] = fadd <4 x float> [[B]], [[SHIFT1]]
-; CHECK-NEXT: [[B01:%.*]] = extractelement <4 x float> [[TMP2]], i32 0
-; CHECK-NEXT: [[SHIFT2:%.*]] = shufflevector <4 x float> [[B]], <4 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 poison, i32 2>
-; CHECK-NEXT: [[TMP3:%.*]] = fadd <4 x float> [[SHIFT2]], [[B]]
-; CHECK-NEXT: [[B23:%.*]] = extractelement <4 x float> [[TMP3]], i64 3
-; CHECK-NEXT: [[V1:%.*]] = insertelement <4 x float> poison, float [[A23]], i32 1
-; CHECK-NEXT: [[V2:%.*]] = insertelement <4 x float> [[V1]], float [[B01]], i32 2
-; CHECK-NEXT: [[V3:%.*]] = insertelement <4 x float> [[V2]], float [[B23]], i32 3
-; CHECK-NEXT: ret <4 x float> [[V3]]
+; SSE-LABEL: @PR34724(
+; SSE-NEXT: [[A0:%.*]] = extractelement <4 x float> [[A:%.*]], i32 0
+; SSE-NEXT: [[A1:%.*]] = extractelement <4 x float> [[A]], i32 1
+; SSE-NEXT: [[SHIFT:%.*]] = shufflevector <4 x float> [[A]], <4 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 3, i32 poison>
+; SSE-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A]], [[SHIFT]]
+; SSE-NEXT: [[A23:%.*]] = extractelement <4 x float> [[TMP1]], i32 2
+; SSE-NEXT: [[SHIFT1:%.*]] = shufflevector <4 x float> [[B:%.*]], <4 x float> poison, <4 x i32> <i32 1, i32 poison, i32 poison, i32 poison>
+; SSE-NEXT: [[TMP2:%.*]] = fadd <4 x float> [[B]], [[SHIFT1]]
+; SSE-NEXT: [[B01:%.*]] = extractelement <4 x float> [[TMP2]], i32 0
+; SSE-NEXT: [[SHIFT2:%.*]] = shufflevector <4 x float> [[B]], <4 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 poison, i32 2>
+; SSE-NEXT: [[TMP3:%.*]] = fadd <4 x float> [[SHIFT2]], [[B]]
+; SSE-NEXT: [[B23:%.*]] = extractelement <4 x float> [[TMP3]], i64 3
+; SSE-NEXT: [[V1:%.*]] = insertelement <4 x float> poison, float [[A23]], i32 1
+; SSE-NEXT: [[V2:%.*]] = insertelement <4 x float> [[V1]], float [[B01]], i32 2
+; SSE-NEXT: [[V3:%.*]] = insertelement <4 x float> [[V2]], float [[B23]], i32 3
+; SSE-NEXT: ret <4 x float> [[V3]]
+;
+; AVX-LABEL: @PR34724(
+; AVX-NEXT: [[A0:%.*]] = extractelement <4 x float> [[A:%.*]], i32 0
+; AVX-NEXT: [[A1:%.*]] = extractelement <4 x float> [[A]], i32 1
+; AVX-NEXT: [[SHIFT:%.*]] = shufflevector <4 x float> [[A]], <4 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 3, i32 poison>
+; AVX-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A]], [[SHIFT]]
+; AVX-NEXT: [[SHIFT1:%.*]] = shufflevector <4 x float> [[B:%.*]], <4 x float> poison, <4 x i32> <i32 1, i32 poison, i32 poison, i32 poison>
+; AVX-NEXT: [[TMP2:%.*]] = fadd <4 x float> [[B]], [[SHIFT1]]
+; AVX-NEXT: [[B01:%.*]] = extractelement <4 x float> [[TMP2]], i32 0
+; AVX-NEXT: [[SHIFT2:%.*]] = shufflevector <4 x float> [[B]], <4 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 poison, i32 2>
+; AVX-NEXT: [[TMP3:%.*]] = fadd <4 x float> [[SHIFT2]], [[B]]
+; AVX-NEXT: [[V1:%.*]] = shufflevector <4 x float> poison, <4 x float> [[TMP1]], <4 x i32> <i32 0, i32 6, i32 2, i32 3>
+; AVX-NEXT: [[V2:%.*]] = insertelement <4 x float> [[V1]], float [[B01]], i32 2
+; AVX-NEXT: [[V3:%.*]] = shufflevector <4 x float> [[V2]], <4 x float> [[TMP3]], <4 x i32> <i32 0, i32 1, i32 2, i32 7>
+; AVX-NEXT: ret <4 x float> [[V3]]
;
%a0 = extractelement <4 x float> %a, i32 0
%a1 = extractelement <4 x float> %a, i32 1
diff --git a/llvm/test/Transforms/VectorCombine/X86/extract-binop.ll b/llvm/test/Transforms/VectorCombine/X86/extract-binop.ll
index 52f7cd859a1ab1..49a636c1f804d0 100644
--- a/llvm/test/Transforms/VectorCombine/X86/extract-binop.ll
+++ b/llvm/test/Transforms/VectorCombine/X86/extract-binop.ll
@@ -417,12 +417,18 @@ define float @ext14_ext15_fmul_v16f32(<16 x float> %x) {
}
define <4 x float> @ins_bo_ext_ext(<4 x float> %a, <4 x float> %b) {
-; CHECK-LABEL: @ins_bo_ext_ext(
-; CHECK-NEXT: [[SHIFT:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 poison, i32 2>
-; CHECK-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[SHIFT]], [[A]]
-; CHECK-NEXT: [[A23:%.*]] = extractelement <4 x float> [[TMP1]], i64 3
-; CHECK-NEXT: [[V3:%.*]] = insertelement <4 x float> [[B:%.*]], float [[A23]], i32 3
-; CHECK-NEXT: ret <4 x float> [[V3]]
+; SSE-LABEL: @ins_bo_ext_ext(
+; SSE-NEXT: [[SHIFT:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 poison, i32 2>
+; SSE-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[SHIFT]], [[A]]
+; SSE-NEXT: [[A23:%.*]] = extractelement <4 x float> [[TMP1]], i64 3
+; SSE-NEXT: [[V3:%.*]] = insertelement <4 x float> [[B:%.*]], float [[A23]], i32 3
+; SSE-NEXT: ret <4 x float> [[V3]]
+;
+; AVX-LABEL: @ins_bo_ext_ext(
+; AVX-NEXT: [[SHIFT:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 poison, i32 2>
+; AVX-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[SHIFT]], [[A]]
+; AVX-NEXT: [[V3:%.*]] = shufflevector <4 x float> [[B:%.*]], <4 x float> [[TMP1]], <4 x i32> <i32 0, i32 1, i32 2, i32 7>
+; AVX-NEXT: ret <4 x float> [[V3]]
;
%a2 = extractelement <4 x float> %a, i32 2
%a3 = extractelement <4 x float> %a, i32 3
@@ -435,13 +441,21 @@ define <4 x float> @ins_bo_ext_ext(<4 x float> %a, <4 x float> %b) {
; but it is likely that extracting from index 3 is the better option.
define <4 x float> @ins_bo_ext_ext_uses(<4 x float> %a, <4 x float> %b) {
-; CHECK-LABEL: @ins_bo_ext_ext_uses(
-; CHECK-NEXT: [[SHIFT:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 3, i32 poison>
-; CHECK-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A]], [[SHIFT]]
-; CHECK-NEXT: [[A23:%.*]] = extractelement <4 x float> [[TMP1]], i32 2
-; CHECK-NEXT: call void @use_f32(float [[A23]])
-; CHECK-NEXT: [[V3:%.*]] = insertelement <4 x float> [[B:%.*]], float [[A23]], i32 3
-; CHECK-NEXT: ret <4 x float> [[V3]]
+; SSE-LABEL: @ins_bo_ext_ext_uses(
+; SSE-NEXT: [[SHIFT:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 3, i32 poison>
+; SSE-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A]], [[SHIFT]]
+; SSE-NEXT: [[A23:%.*]] = extractelement <4 x float> [[TMP1]], i32 2
+; SSE-NEXT: call void @use_f32(float [[A23]])
+; SSE-NEXT: [[V3:%.*]] = insertelement <4 x float> [[B:%.*]], float [[A23]], i32 3
+; SSE-NEXT: ret <4 x float> [[V3]]
+;
+; AVX-LABEL: @ins_bo_ext_ext_uses(
+; AVX-NEXT: [[SHIFT:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 3, i32 poison>
+; AVX-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A]], [[SHIFT]]
+; AVX-NEXT: [[A23:%.*]] = extractelement <4 x float> [[TMP1]], i32 2
+; AVX-NEXT: call void @use_f32(float [[A23]])
+; AVX-NEXT: [[V3:%.*]] = shufflevector <4 x float> [[B:%.*]], <4 x float> [[TMP1]], <4 x i32> <i32 0, i32 1, i32 2, i32 6>
+; AVX-NEXT: ret <4 x float> [[V3]]
;
%a2 = extractelement <4 x float> %a, i32 2
%a3 = extractelement <4 x float> %a, i32 3
@@ -452,22 +466,37 @@ define <4 x float> @ins_bo_ext_ext_uses(<4 x float> %a, <4 x float> %b) {
}
define <4 x float> @PR34724(<4 x float> %a, <4 x float> %b) {
-; CHECK-LABEL: @PR34724(
-; CHECK-NEXT: [[A0:%.*]] = extractelement <4 x float> [[A:%.*]], i32 0
-; CHECK-NEXT: [[A1:%.*]] = extractelement <4 x float> [[A]], i32 1
-; CHECK-NEXT: [[SHIFT:%.*]] = shufflevector <4 x float> [[A]], <4 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 3, i32 poison>
-; CHECK-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A]], [[SHIFT]]
-; CHECK-NEXT: [[A23:%.*]] = extractelement <4 x float> [[TMP1]], i32 2
-; CHECK-NEXT: [[SHIFT1:%.*]] = shufflevector <4 x float> [[B:%.*]], <4 x float> poison, <4 x i32> <i32 1, i32 poison, i32 poison, i32 poison>
-; CHECK-NEXT: [[TMP2:%.*]] = fadd <4 x float> [[B]], [[SHIFT1]]
-; CHECK-NEXT: [[B01:%.*]] = extractelement <4 x float> [[TMP2]], i32 0
-; CHECK-NEXT: [[SHIFT2:%.*]] = shufflevector <4 x float> [[B]], <4 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 poison, i32 2>
-; CHECK-NEXT: [[TMP3:%.*]] = fadd <4 x float> [[SHIFT2]], [[B]]
-; CHECK-NEXT: [[B23:%.*]] = extractelement <4 x float> [[TMP3]], i64 3
-; CHECK-NEXT: [[V1:%.*]] = insertelement <4 x float> undef, float [[A23]], i32 1
-; CHECK-NEXT: [[V2:%.*]] = insertelement <4 x float> [[V1]], float [[B01]], i32 2
-; CHECK-NEXT: [[V3:%.*]] = insertelement <4 x float> [[V2]], float [[B23]], i32 3
-; CHECK-NEXT: ret <4 x float> [[V3]]
+; SSE-LABEL: @PR34724(
+; SSE-NEXT: [[A0:%.*]] = extractelement <4 x float> [[A:%.*]], i32 0
+; SSE-NEXT: [[A1:%.*]] = extractelement <4 x float> [[A]], i32 1
+; SSE-NEXT: [[SHIFT:%.*]] = shufflevector <4 x float> [[A]], <4 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 3, i32 poison>
+; SSE-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A]], [[SHIFT]]
+; SSE-NEXT: [[A23:%.*]] = extractelement <4 x float> [[TMP1]], i32 2
+; SSE-NEXT: [[SHIFT1:%.*]] = shufflevector <4 x float> [[B:%.*]], <4 x float> poison, <4 x i32> <i32 1, i32 poison, i32 poison, i32 poison>
+; SSE-NEXT: [[TMP2:%.*]] = fadd <4 x float> [[B]], [[SHIFT1]]
+; SSE-NEXT: [[B01:%.*]] = extractelement <4 x float> [[TMP2]], i32 0
+; SSE-NEXT: [[SHIFT2:%.*]] = shufflevector <4 x float> [[B]], <4 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 poison, i32 2>
+; SSE-NEXT: [[TMP3:%.*]] = fadd <4 x float> [[SHIFT2]], [[B]]
+; SSE-NEXT: [[B23:%.*]] = extractelement <4 x float> [[TMP3]], i64 3
+; SSE-NEXT: [[V1:%.*]] = insertelement <4 x float> undef, float [[A23]], i32 1
+; SSE-NEXT: [[V2:%.*]] = insertelement <4 x float> [[V1]], float [[B01]], i32 2
+; SSE-NEXT: [[V3:%.*]] = insertelement <4 x float> [[V2]], float [[B23]], i32 3
+; SSE-NEXT: ret <4 x float> [[V3]]
+;
+; AVX-LABEL: @PR34724(
+; AVX-NEXT: [[A0:%.*]] = extractelement <4 x float> [[A:%.*]], i32 0
+; AVX-NEXT: [[A1:%.*]] = extractelement <4 x float> [[A]], i32 1
+; AVX-NEXT: [[SHIFT:%.*]] = shufflevector <4 x float> [[A]], <4 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 3, i32 poison>
+; AVX-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A]], [[SHIFT]]
+; AVX-NEXT: [[SHIFT1:%.*]] = shufflevector <4 x float> [[B:%.*]], <4 x float> poison, <4 x i32> <i32 1, i32 poison, i32 poison, i32 poison>
+; AVX-NEXT: [[TMP2:%.*]] = fadd <4 x float> [[B]], [[SHIFT1]]
+; AVX-NEXT: [[B01:%.*]] = extractelement <4 x float> [[TMP2]], i32 0
+; AVX-NEXT: [[SHIFT2:%.*]] = shufflevector <4 x float> [[B]], <4 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 poison, i32 2>
+; AVX-NEXT: [[TMP3:%.*]] = fadd <4 x float> [[SHIFT2]], [[B]]
+; AVX-NEXT: [[V1:%.*]] = shufflevector <4 x float> undef, <4 x float> [[TMP1]], <4 x i32> <i32 0, i32 6, i32 2, i32 3>
+; AVX-NEXT: [[V2:%.*]] = insertelement <4 x float> [[V1]], float [[B01]], i32 2
+; AVX-NEXT: [[V3:%.*]] = shufflevector <4 x float> [[V2]], <4 x float> [[TMP3]], <4 x i32> <i32 0, i32 1, i32 2, i32 7>
+; AVX-NEXT: ret <4 x float> [[V3]]
;
%a0 = extractelement <4 x float> %a, i32 0
%a1 = extractelement <4 x float> %a, i32 1
diff --git a/llvm/test/Transforms/VectorCombine/X86/load-inseltpoison.ll b/llvm/test/Transforms/VectorCombine/X86/load-inseltpoison.ll
index c4aba63568e2ff..e99e21641531ab 100644
--- a/llvm/test/Transforms/VectorCombine/X86/load-inseltpoison.ll
+++ b/llvm/test/Transforms/VectorCombine/X86/load-inseltpoison.ll
@@ -537,19 +537,32 @@ define <2 x float> @load_f32_insert_v2f32_asan(ptr align 16 dereferenceable(16)
declare ptr @getscaleptr()
define void @PR47558_multiple_use_load(ptr nocapture nonnull %resultptr, ptr nocapture nonnull readonly %opptr) nofree nosync {
-; CHECK-LABEL: @PR47558_multiple_use_load(
-; CHECK-NEXT: [[SCALEPTR:%.*]] = tail call nonnull align 16 dereferenceable(64) ptr @getscaleptr()
-; CHECK-NEXT: [[OP:%.*]] = load <2 x float>, ptr [[OPPTR:%.*]], align 4
-; CHECK-NEXT: [[SCALE:%.*]] = load float, ptr [[SCALEPTR]], align 16
-; CHECK-NEXT: [[T1:%.*]] = insertelement <2 x float> poison, float [[SCALE]], i32 0
-; CHECK-NEXT: [[T2:%.*]] = insertelement <2 x float> [[T1]], float [[SCALE]], i32 1
-; CHECK-NEXT: [[T3:%.*]] = fmul <2 x float> [[OP]], [[T2]]
-; CHECK-NEXT: [[T4:%.*]] = extractelement <2 x float> [[T3]], i32 0
-; CHECK-NEXT: [[RESULT0:%.*]] = insertelement <2 x float> poison, float [[T4]], i32 0
-; CHECK-NEXT: [[T5:%.*]] = extractelement <2 x float> [[T3]], i32 1
-; CHECK-NEXT: [[RESULT1:%.*]] = insertelement <2 x float> [[RESULT0]], float [[T5]], i32 1
-; CHECK-NEXT: store <2 x float> [[RESULT1]], ptr [[RESULTPTR:%.*]], align 8
-; CHECK-NEXT: ret void
+; SSE2-LABEL: @PR47558_multiple_use_load(
+; SSE2-NEXT: [[SCALEPTR:%.*]] = tail call nonnull align 16 dereferenceable(64) ptr @getscaleptr()
+; SSE2-NEXT: [[OP:%.*]] = load <2 x float>, ptr [[OPPTR:%.*]], align 4
+; SSE2-NEXT: [[SCALE:%.*]] = load float, ptr [[SCALEPTR]], align 16
+; SSE2-NEXT: [[T1:%.*]] = insertelement <2 x float> poison, float [[SCALE]], i32 0
+; SSE2-NEXT: [[T2:%.*]] = insertelement <2 x float> [[T1]], float [[SCALE]], i32 1
+; SSE2-NEXT: [[T3:%.*]] = fmul <2 x float> [[OP]], [[T2]]
+; SSE2-NEXT: [[T4:%.*]] = extractelement <2 x float> [[T3]], i32 0
+; SSE2-NEXT: [[RESULT0:%.*]] =...
[truncated]
|
| SmallVector<int> Mask(NumElts); | ||
| std::iota(Mask.begin(), Mask.end(), 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please just use SmallVector (size_t Size, const T &Value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its needs the iota for the ascending shuffle mask - its not a splat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't catch up what you mean.
@mshockwave you mean it is better to use SmallVector with initial value? or it can be replace from std::iota?
@RKSimon what is mean splat?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A splat is when the same value is assigned to every element - but for this shuffle mask you need <0, 1, 2, 3, ..., N-1> which is what iota will give you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see! thakns for letting me know
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my bad, I misunderstood what std::iota does
| Mask[InsIdx] = ExtIdx + NumElts; | ||
| // Cost | ||
| ExtractElementInst *Ext; | ||
| if ((Ext = dyn_cast<ExtractElementInst>(I.getOperand(0))) == nullptr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (!isa<ExtractElementInst>(I.getOperand(0))) ?
| if (!isSafeToSpeculativelyExecute(BO)) | ||
| return false; | ||
|
|
||
| auto *VecTy = cast<FixedVectorType>(I.getType()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will crash if I has scalable vector type. You can limit this combine rule to fixed vector only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto *VecTy = dyn_cast<FixedVectorType>(I.getType());
if (!VecTy || BO->getType() != VecTy)
return false;
RKSimon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why you require the BinOp for this - I'd be tempted to just make this about a insertelement(X, extractelement(Y,C1), C2 -> shuffle X, Y fold.
| } | ||
|
|
||
| /// insert (DstVec, (extract (binop), ExtIdx), InsIdx) --> | ||
| /// shuffl (DstVec, (binop), Mask) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shuffle
| if (!isSafeToSpeculativelyExecute(BO)) | ||
| return false; | ||
|
|
||
| auto *VecTy = cast<FixedVectorType>(I.getType()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto *VecTy = dyn_cast<FixedVectorType>(I.getType());
if (!VecTy || BO->getType() != VecTy)
return false;
| SmallVector<int> Mask(NumElts); | ||
| std::iota(Mask.begin(), Mask.end(), 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its needs the iota for the ascending shuffle mask - its not a splat.
5afe48e to
290e9c5
Compare
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
290e9c5 to
5ae995e
Compare
| return false; | ||
|
|
||
| unsigned NumElts = VecTy->getNumElements(); | ||
| if (ExtIdx >= NumElts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (ExtIdx >= NumElts || InsIdx >= NumElts)
| ExtractElementInst *Ext; | ||
| Ext = isa<ExtractElementInst>(I.getOperand(0)) | ||
| ? cast<ExtractElementInst>(I.getOperand(0)) | ||
| : cast<ExtractElementInst>(I.getOperand(1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The match above has confirmed that I is an InsertElementInst - so this would be better as:
auto *Ins = cast<InsertElementInst>(&I):
auto *Ext = cast<ExtractElementInst>(I.getOperand(1)) ;
|
|
||
| TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput; | ||
| InstructionCost OldCost = | ||
| TTI.getVectorInstrCost(*Ext, VecTy, CostKind, ExtIdx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OldCost should account for the Ins and Ext:
InstructionCost OldCost =
TTI.getVectorInstrCost(*Ext, VecTy, CostKind, ExtIdx) +
TTI.getVectorInstrCost(*Ins, VecTy, CostKind, InsIdx);
| InstructionCost OldCost = | ||
| TTI.getVectorInstrCost(*Ext, VecTy, CostKind, ExtIdx); | ||
| InstructionCost NewCost = | ||
| TTI.getShuffleCost(TargetTransformInfo::SK_Select, VecTy, Mask); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SK_Select can only be used when InsIdx == ExtIdx - this needs to be SK_PermuteTwoSrc (improveShuffleKindFromMask is called internally by getShuffleCost and will convert it to SK_Select if valid)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with vectorcombine so I made a mistake, thanks for letting me know and I'll try harder.
| Value *DstVec, *SrcVec; | ||
| uint64_t ExtIdx, InsIdx; | ||
| if (!match(&I, m_InsertElt(m_Value(DstVec), | ||
| m_OneUse(m_ExtractElt(m_Value(SrcVec), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to remove the m_OneUse if we can account for it in the NewCost below by adding the ExtractElementInst cost back if its has multiple uses.
5ae995e to
78bf832
Compare
|
@ParkHanbum Please can you merge with trunk again to see if we can get the CI to run green ? |
insert (DstVec, (extract SrcVec, ExtIdx), InsIdx) --> shuffle (DstVec, SrcVec, Mask) This commit combines extract/insert on a vector into Shuffle with vector.
78bf832 to
362e704
Compare
RKSimon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - cheers!
|
@davemgreen any comments? |
|
Hi - I think so long as it is cost-modelled it should be OK. Our shuffle generation is not going to be optimal at times (if it doesn't go though perfect shuffle tables), but the costs should be equally high as far as I understand. |
insert (DstVec, (extract SrcVec, ExtIdx), InsIdx)
--> shuffle (DstVec, SrcVec, Mask)
This commit combines extract/insert on a vector into Shuffle with
vector.